INSTALLING A DATA DICTIONARY


During the 1960s, most computerized application systems
were designed with their own data files. Both the data and the re- 
sponsibility for the data were fragmented—and many of those sys- 
tems are still operating today. During the 1970s, one major trend 
has been toward the use of the shared-data data base that serves 
multiple applications. Numerous benefits are being obtained from 
data bases, including the reduction of undesired redundancies and 
incompatibilities of data. But at the same time, data bases have 
brought into focus the need for assigned responsibility and control 
of the organization’s data resources. The data administrator func- 
tion is being set up and given this responsibility. And data diction- 
aries are being installed as tools for the data administrators to help 
them perform this function. Here are some user experiences in 
putting in data dictionary systems. 


Data dictionaries are systems and procedures—
either manual or automated—for the storing and 
handling of an organization’s data definitions. In 
theory, their reason for existence need have noth- 
ing to do with computers. Well-managed organi- 
zations should have clearly specified definitions 
for all of the data items used by the organizations, 
according to theory. But this is not the way things 
have worked out in practice. 

In practice, the units of an organization—for in- 
stance, the departments—have developed systems 
and procedures to meet their needs, over the 
years. Each organizational unit has developed its 
own data definitions. Only when certain data 
items must be regularly exchanged among organizational
units have standard definitions emerged.
The data definitions for financial data are the ones
most frequently subject to standardization, since
financial data is collected from all organizational
units.

The result has been that, in most organizations,
the same data items have been given different
names and different definitions by the different 
units of those organizations. Similarly, the same 
names have often been given to quite different 
data items. Because of this, management has 
found it very hard to compare data from the vari- 
ous units. 

A data dictionary might very well be a paper- 
based system for pulling together all of the data 
definitions used by the units of an organization, in 
an attempt to reduce undesired redundancy and 
remove inconsistencies. Or a data dictionary 
could just as well be a computerized system to do 
this same function. There are numerous other 
functions that a data dictionary might perform, as
we will discuss.


One thing stands out in the above discussion. A 
data dictionary does not, in itself, generally pro- 
duce operational data for the organization. It 
does not produce invoices, or job orders, or prod- 
uct catalogs, or such. Instead, its purpose is more 
to eliminate the errors of understanding, the ambiguities,
and the difficulties in interpreting the
data. As such, a data dictionary is an overhead
item. And unless the costs of those errors, ambiguities,
and difficulties are clearly evident, it
may be hard to justify the installation of a data 
dictionary. 

The installation of computerized data bases 
and data base management systems has been 
bringing into focus the need for data dictionaries. 
Our discussion will center on mechanized data 
dictionaries that support computerized systems. 
Let us now look at some of the advantages and 
some of the problems associated with mechanized 
data dictionaries. 


Advantages of a dictionary 


Lefkovits (Reference 1), Leong-Hong and Mar- 
ron (Reference 2), Ehrensberger (Reference 3), 
Schussel (Reference 5), and EDPACS (Reference 6)
provide good discussions of the advantages of 
data dictionaries, from which the following is 
drawn. 


Control of the data. By identifying the differ- 
ent definitions of the same data item, through the 
use of a data dictionary, an organization can 
standardize on one definition for that data item. 
Further, the accepted definitions for all data 
items will be found in the dictionary. Hence, un- 
intentional redundancies will be largely elimi- 
nated. Further, discipline will be imposed on the 
introduction of new data items or changes to ex- 
isting data definitions. 
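
To make the control idea concrete, here is a minimal sketch, in Python, of how a mechanized dictionary might report the two kinds of conflict described above. The unit names, data names, and the "meaning" key are all hypothetical; in practice, deciding that two definitions mean the same thing is a judgment made by the data administrator, not by the program.

```python
from collections import defaultdict

# Hypothetical departmental definitions: (unit, local name, meaning).
# The "meaning" text stands in for the data administrator's judgment
# that two definitions describe the same data item.
DEFINITIONS = [
    ("accounting", "CUST-NO", "customer account number"),
    ("sales", "ACCT-NUM", "customer account number"),
    ("sales", "CLASS", "merchandise class"),
    ("personnel", "CLASS", "job classification"),
    ("shipping", "STORE", "store number"),
]


def find_synonyms(definitions):
    """Return each meaning that is known under more than one name."""
    names_by_meaning = defaultdict(set)
    for _unit, name, meaning in definitions:
        names_by_meaning[meaning].add(name)
    return {m: sorted(n) for m, n in names_by_meaning.items() if len(n) > 1}


def find_homonyms(definitions):
    """Return each name that is used for more than one meaning."""
    meanings_by_name = defaultdict(set)
    for _unit, name, meaning in definitions:
        meanings_by_name[name].add(meaning)
    return {n: sorted(m) for n, m in meanings_by_name.items() if len(m) > 1}


if __name__ == "__main__":
    print("same item, different names:", find_synonyms(DEFINITIONS))
    print("same name, different items:", find_homonyms(DEFINITIONS))
```

Both reports are simply lists of candidates for standardization; choosing the one accepted name or definition remains a management decision.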

The data dictionary can be used as an enforce- 
ment tool. It can help enforce the use of the stand- 
ard data definitions. It can be used to enforce 
security safeguards, both for data entities within 
the data bases as well as for the data dictionary it- 
self. And it can be used to enforce the use of stand- 
ard edit and validation routines. 

In addition, the data dictionary can be used as 
an audit tool. It can help auditors—internal audi- 
tors, external auditors, system reviewers, etc.— 
gain an understanding of the systems to be au- 
dited. Further, the dictionary can provide the 
data standards against which the audits are to be 
performed.


Improved system development and control. By 
providing centrally maintained data definitions, 
the data dictionary can support programming
standards and naming standards. Both systems de- 
sign and programming can be done faster because 
the data is already defined. System and program 
maintenance are made easier because complete 
documentation of the data is provided. By dis- 
playing all uses of a given data item, the full im- 
pact of a proposed change can be assessed. 
Further, the time that will be required to make 
the change can be more accurately determined. 

By providing both user-oriented and computer-oriented
definitions for each data item, the
data dictionary can help provide better communication
between users and the system development
staff.


Automatic generation capability. The data 
dictionary can be interfaced (bridged) to other 
software systems, so as to automatically generate 
input for those systems. Program data definitions 
(such as COBOL data divisions) can be supplied by
the dictionary, as can program input-output area 
definitions. The definitions required by the data 
dictionary associated with the data base manage- 
ment system can be automatically generated. The 
dictionary can generate reports about the data 
definitions, as well as documentation about the 
data base. 
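
As a rough illustration of the generation idea, the sketch below renders one dictionary entry as a COBOL-style record description. It is written in Python, the entry layout and all names are hypothetical, and it ignores the fixed column positions that a real COBOL source line requires; it is not the output format of any particular dictionary product.

```python
# Hypothetical dictionary entry for one record. The record name, field
# names, and pictures are illustrative only.
CUSTOMER_RECORD = {
    "name": "CUSTOMER-RECORD",
    "fields": [
        {"name": "CUST-NO", "picture": "9(6)"},
        {"name": "CUST-NAME", "picture": "X(30)"},
        {"name": "BALANCE-DUE", "picture": "S9(7)V99"},
    ],
}


def cobol_record(entry):
    """Render a dictionary entry as a COBOL-style record description.

    One 01-level line is produced for the record and one 05-level line
    for each field. Column alignment is simplified for readability.
    """
    lines = ["01  {}.".format(entry["name"])]
    for field in entry["fields"]:
        lines.append(
            "    05  {:<20}PIC {}.".format(field["name"], field["picture"])
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(cobol_record(CUSTOMER_RECORD))
```

Because every program receives its record description from the same entry, a change made once in the dictionary reaches all of them the next time the definitions are generated.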

All in all, then, a data dictionary offers many 
potential benefits for the user. But all is not milk- 
and-honey. There are some problems, too. 


Problems with dictionaries 


Perhaps the first problem that confronts the po- 
tential user of a data dictionary might be termed 
the what-where-when problem. What will it be 
used for? Where will it be obtained from? When 
(in relation to other activities) will it be installed? 

Here are some of the factors that have to be 
considered in making these decisions. Basic type 
of dictionary: will the dictionary be manual or 
automated? As mentioned above, we are assum- 
ing that an automated dictionary is being consid- 
ered. However, many of the same considerations 
apply to manual data dictionaries. Uses: will the 
dictionary be used to support system devel- 
opment, system maintenance, productive use of 
the data base, or a combination of these? Type
of mechanized dictionary: must the dictionary be
independent of any particular data base
management system, or may it require the use of a
specific DBMS? Source of the dictionary: will the
dictionary be purchased or will it be developed 
in-house? Timing of the installation: will the dic- 
tionary be installed prior to the installation of a 
DBMS or after it? Scope of the project: will the dic- 
tionary be used for all data, forms, and processes 
used by the organization, or will it be limited to 
just the mechanized data, or will it apply only to 
data base data? 

Hopefully, this brief listing of factors will give 
some idea of the inherent complexity of the what- 
where-when problem. It is true, of course, that 
many prospective users will make the same types 
of decisions that numerous dictionary users have 
made in the past. These are the so-called “practical”
decisions; for example: “We want an automated
dictionary to support our current data base
applications; we do not care if it uses the same
DBMS that we are using or not, but we will buy it,
not build it; we want it to support the productive
use of our data bases, and we will use it for data
base data only.” As “practical” as these decisions
seem to be, we will attempt to show in this report 
that the planning for a data dictionary really 
ought to take a much broader view. 


Lefkovits (Reference 1) points out some other 
problems associated with data dictionaries. In or- 
der to achieve the benefits, the user organization 
must have (a) a good degree of commitment by 
management, users, and data processing per- 
sonnel, (b) an effective data administration func- 
tion, and (c) an effective method for planning the 
introduction of change into information systems. 

Even if these conditions are met, says Lefko- 
vits, the characteristics of the data dictionary se- 
lected can lead to installation problems. For 
instance, some dictionaries require the use of 
short (too short, in Lefkovits’ opinion) data 
names, while others allow for longer “natural”
names. Some support mechanized data in conven- 
tional files (tape or disk) much better than others. 
Some use free-form commands while others use 
fixed-form. In addition to the ability to define data 
entities, some systems also allow the definition of 
process entities (systems, programs, or modules), 
and usage entities (users, terminals, etc.). These 
are only a very few of the significant differences 
pointed up by Lefkovits. If you are considering a 
data dictionary, or are considering replacing your 
current one, we strongly urge you to read Reference 1.


Field experience problems


We talked to Keith Setzer, an executive at Uni- 
versity Computing Company, Dallas, Texas, 
about field experiences with data dictionaries. 
UCC has developed and is marketing the widely- 
used UCC 10 data dictionary system. We asked 
Setzer about some of the things to do and not to 
do, when installing and using a data dictionary, 
based on what his company had encountered. 

But first, a few words of background about 
UCC 10. UCC obtained IBM’s IMS data base management
system in 1969, for in-house use. Very
soon, they saw a need for a data dictionary, to 
help them get their data definitions under control. 
The UCC 10 was developed in 1970. Originally it 
was a batch system, in which the data definitions 
were printed out. These printouts could be quite 
voluminous—and expensive, when an entire 
printout had to be obtained just to get one 
change. So in 1972, an on-line query and update 
facility was added. The on-line query ability is 
available for both production and test status data, 
but on-line update is limited to test status data 
definitions only. In 1973, UCC started marketing 
UCC 10. It is designed for IMS data bases and
works with IMS and IMS/VS. So Setzer’s comments
were based not only on what UCC representa- 
tives had observed in the field but also upon 
UCC’s own use. 


Situations that typically lead to problems. Setzer
identified three situations that typically lead
to serious problems or to outright failures to install
a data dictionary. A political ploy: The acquisition
of the data dictionary may be a part of an
actual or an apparent political ploy. Perhaps the
reason given for the acquisition is that the computer
center is running short of disk space and
wants a dictionary to help eliminate redundant
data. But user departments may see the acquisition
of the dictionary as one more effort to gain
more control, in this case over the data definitions.
The users may resist the dictionary, in order
to retain control over “their own” data
definitions.

Huge clean-up problem: Setzer pointed out
that some organizations have been using a DBMS
for five or six years, mainly as an access method
rather than as a true DBMS. Many data base applications
are running, but with little integration
among them. Much redundancy and many inconsistencies
exist in the data definitions, and there is
little or no current documentation of those definitions.
No data administration function has been
established. In such a situation, the organization 
has a huge clean-up job ahead of it, if it wants to 
use a data dictionary effectively. 

Budget pressure: If the organization’s financial 
problems lead to significant budget reductions, 
one of the first things that probably would be cut 
would be the data dictionary project. As we 
pointed out earlier, a data dictionary is an over- 
head item, as is the data administrator function. 
Neither can stand up well in the face of budget 
cuts. 

(We did hear a report that some companies are 
finding that this function reduces system analysis 
and design times and costs sufficiently to cover 
the costs of the function. But we have not talked 
to users on this point.) 


A not-bad situation for a dictionary. On the 
other hand, said Setzer, if the organization obtained
its DBMS about a year previously, and had
developed one pilot application and had put it
into production, the chances of success for a dictionary
might be much improved. Such an organization
might well see the need for controlling the
data definitions. In fact, they might want to install
the data dictionary before putting on more DBMS
applications. 


The best situation for a dictionary. Ideally, 
said Setzer, the data dictionary should be considered
at least as soon as the DBMS is considered. If
the organization is willing to consider setting up 
the data administration function, developing data 
standards, and installing the data dictionary as the 
first data base application, then the chances of 
success are much enhanced. 


The importance of data administration 


The effective use of a data dictionary really re- 
quires an effective data administration (or data 
base administration) function, said Setzer. Here 
are some of the things that the function must be 
able to do, in his view. 


Clean up the data definitions. The data admin- 
istration function must have the responsibility 
and authority to get rid of undesired definition re- 
dundancies and inconsistencies. The organization 
should not allow two or more names to exist for
the same data item, nor should it allow the same 
name to be used for two or more different data 
items. It may take the data administrator some 
time to accomplish this, of course. 


Control “corporate” data definitions. While 
data that is used solely by one organizational unit 
might be considered as “local” and under the con- 
trol of that unit, data used by two or more units 
should be considered as “corporate.” The data ad- 
ministration function should be able to control 
the acceptance of and perform the updating of all 
“corporate” data definitions. 


Control the changes to “corporate” data definitions.
The data administration function should
be charged with analyzing the impact of all pro- 
posed changes to “corporate” data definitions. All 
programs that would have to be changed should 
be identified before approval to make the change 
is given. A data dictionary can be of tremendous 
help here, said Setzer. It provides one place to 
look for all uses of the data. Otherwise it may be 
necessary to get printouts of all program and data 
base data definitions and to manually scan them. 
Finally, approval to proceed with the change 
might be held up until all affected programs have 
been changed, to keep those applications from 
aborting. 
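
The "one place to look" that Setzer describes amounts to a where-used index. The following Python sketch shows the idea under simple assumptions: the program names and data names are hypothetical, and a real dictionary would record usage for files, records, and reports as well as programs.

```python
# Hypothetical where-used records: (program, data item it refers to).
USAGES = [
    ("BILLING01", "CUST-NO"),
    ("BILLING01", "BALANCE-DUE"),
    ("CREDIT02", "CUST-NO"),
    ("MAILING03", "CUST-NAME"),
]


def build_where_used(usages):
    """Index the usage records by data item."""
    index = {}
    for program, item in usages:
        index.setdefault(item, set()).add(program)
    return index


def impact_of_change(index, item):
    """List every program that would have to change with the item."""
    return sorted(index.get(item, set()))


def may_release_change(index, item, converted_programs):
    """Allow the change only when all affected programs are converted."""
    return set(impact_of_change(index, item)) <= set(converted_programs)


if __name__ == "__main__":
    where_used = build_where_used(USAGES)
    print(impact_of_change(where_used, "CUST-NO"))
    print(may_release_change(where_used, "CUST-NO", ["BILLING01"]))
```

The length of the affected-program list is also a first approximation of the effort needed to make the change.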


Oversee some security functions. The data ad- 
ministrator must be able to assign security levels 
and “need to know” clearances for the use of both 
the data dictionary and the data bases. Even if a 
user is entitled to access certain data in the data 
base, the access might be limited to read-only, as 
opposed to update. Further, the data adminis- 
trator might be the only one allowed to update 
the data dictionary; all changes to definitions 
would have to go through this function. 
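
A minimal sketch of these security rules, in Python, might look as follows. The user names, data names, and the two-level read/update scheme are assumptions made for illustration; actual dictionaries and data base management systems each have their own security facilities.

```python
READ, UPDATE = "read", "update"

# Hypothetical clearances: user -> {data item: highest access granted}.
CLEARANCES = {
    "credit-clerk": {"BALANCE-DUE": READ},
    "billing-clerk": {"BALANCE-DUE": UPDATE},
}

# Assumption: one function alone may change the dictionary itself.
DATA_ADMINISTRATOR = "data-admin"


def may_access(user, item, requested):
    """Check a data base access request against the user's clearance."""
    granted = CLEARANCES.get(user, {}).get(item)
    if granted is None:
        return False  # no "need to know" recorded for this item
    # Update clearance also covers reading; read clearance covers only reads.
    return requested == READ or granted == UPDATE


def may_update_dictionary(user):
    """Only the data administrator may change definitions."""
    return user == DATA_ADMINISTRATOR


if __name__ == "__main__":
    print(may_access("credit-clerk", "BALANCE-DUE", UPDATE))
    print(may_update_dictionary("billing-clerk"))
```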

To illustrate how an organization can approach 
the installation of a data dictionary in a desirable 
manner, let us consider the experience of Macy’s 
California. 


Macy’s California 


Macy’s California Division, with headquarters 
in San Francisco, Calif., is a division of the R. H. 
Macy Inc., a large department store chain. Fortune
magazine ranks Macy’s as 27th in sales volume
among U.S. retailers, with sales of over $1.4
billion annually. The corporation has about 
38,000 employees. 

Macy’s California has 20 stores throughout
central California. The data processing for these 
stores is performed at the division’s computer 
center in San Francisco. The division is using an 
IBM 370/158 and there are 28 people on the de- 
velopment staff. 

While Macy’s California is not yet using a data 
base management system, data processing management
recognized several years ago that a DBMS
probably would be a desirable acquisition. But it
was also recognized that it would be helpful to install
a data dictionary system in order to get the
data definitions and documentation under control.
So management decided that the data dictionary
would be considered ahead of the DBMS. A
project was set up in mid-1976 to investigate and 
evaluate alternative data dictionaries. The assist- 
ant manager of system development was assigned 
to the project. 

A policy decision made early in the project 
shaped the whole study. This decision was that 
the data dictionary should not influence the 
choice of the DBMS. If a data dictionary works
with only one DBMS, it was not to be considered.
Also, Macy’s wanted a dictionary package that 
could handle conventional tape and disk files as 
well as data base data definitions. A number of 
leading data dictionaries were thus eliminated 
from consideration. 

After looking at the major data dictionaries on 
the market, two prime candidates emerged. Both 
were studied in some detail, and users of both 
were visited and interviewed. About two months 
after the project started, the DATAMANAGER data 
dictionary, developed and marketed by MSP, Inc. 
(of Lexington, Massachusetts) was selected for de- 
tailed evaluation. 

Having selected what seemed to them to be the 
most suitable data dictionary for their needs, 
Macy’s decided to use the package’s trial period 
to find out just how well in fact it met their needs. 
They also wanted to find out what would be re- 
quired of them in order to make effective use of 
the dictionary. 

The first step in this evaluation was to select 
one existing application system and put all of the 
data definitions from that application into the dictionary.
In performing this step, they encountered
the problems of loading a dictionary. Some
of the data definitions could be obtained automatically
from COBOL programs. But these definitions
were not complete, as far as the dictionary
was concerned. And in some instances, the definitions
had to be changed. What Macy’s learned
from this exercise was that it is not wise to jump 
too fast into the automatic loading of a dictionary. 

Another experiment was to test the interfaces 
between DATAMANAGER and the PANVALET library
system they use. The data definitions that were
picked up automatically from the COBOL programs
were outputted to the library system. From
this test, they obtained some idea of the type of 
operation they could plan on. 

Macy’s California also went through the exercise
of developing “integrated” data definitions,
for a few of the more widely used data elements.
For instance, such terms as “store,” “department,”
and “class” are widely used within a multiple
branch department store. The experiment
found a number of variations among the existing 
definitions, and then attempted to obtain standard 
definitions that were acceptable to the user de- 
partments, system analysts, and programmers. 
This test pointed out to them just what would be 
needed in order to develop naming conventions 
and standards. 

There were a number of features of DATAMANAGER
that Macy’s liked particularly. Since this dictionary
interfaces with a number of the leading
DBMS, they liked the freedom of choice it provides 
them in this selection. They like the fact that it al- 
lows for defining process entities as well as data 
entities. They like the dummy entity capability, 
which allows setting up the definitions for a new 
entity as the need becomes apparent but before 
very much is known about those definitions. 

There were also some features that Macy’s 
found they wanted in their data dictionary. One 
was the facility to handle “clerk” entities in much 
the same manner as “program” entities. For each
“clerk” entity, the definitions would indicate the 
inputs used, files accessed, logic of processing, and 
the outputs generated. Other users of DATAMANAGER
were asking for this same feature, so MSP has
implemented it by allowing other than standard 
member type names. For instance, instead of us- 
ing the normal system-program-item hierarchy 
for entities, one can use a company-department- 
clerk hierarchy, or some other such hierarchy.
Macy’s believes that such facilities make the dictionary
easier to use for defining all data, process,
and usage entities. 

By the end of the trial use period, the people at 
Macy’s California had decided that they liked 
DATAMANAGER and that it would serve their needs
well. So the dictionary was purchased. But at the 
same time, they had found a number of activities 
that should be done in preparation for installing 
the dictionary. One of these activities was to de- 
velop the division’s standard data definitions. 
When we visited them, they were in the midst of 
this activity. When all of these necessary prepara- 
tory activities have been accomplished, the dic- 
tionary will be installed. 

And after the dictionary has been installed, the 
people at Macy’s California expect to consider 
the question of what DBMS they want to use.


THE ROLE OF THE DATA DICTIONARY


There are two major objectives for installing a 
data dictionary system. These are: 
• Getting all data and process definitions under
control
• Getting just mechanized data definitions under
control
We will discuss each of these briefly. 


Getting all data definitions under control 


Some organizations see the data dictionary as a 
tool with which the data administrator function 
can get all of the data, process, and usage defini- 
tions under control. 

Just what does this objective really imply? Ehr- 
ensberger (Reference 3) says that a data diction- 
ary should be able to contain information about 
the following: data bases, files, fields, transactions, 
source documents, reports, systems, programs, 
users, departments, projects, standards, security 
levels, and personnel.

Another interpretation of this objective is that 
the dictionary should be able to store and handle 
the definitions for the organization’s data (in filing 
cabinets, on forms, in conventional mechanized 
files, in data bases, etc.), the organization’s processes
(used by clerks, managers, programs, etc.),
the organization’s systems (data processing, word
processing, data communications, etc.), and the
organization’s users of these entities (people, departments,
divisions, etc.).

At first, such objectives might appear to be almost
unattainable. But, in fact, they are not at all
impractical. As a matter of fact, there is one data
dictionary on the market, PRIDE-Logik, that is
specifically designed to help an organization get 
started in this direction. To illustrate, consider the 
experience of Marathon Oil Company. 


Marathon Oil Company 

Marathon Oil Company, with headquarters in 
Findlay, Ohio, is a fully integrated oil company. 
It has annual sales in excess of $3 billion and em- 
ploys almost 12,000 people worldwide. For its 
data processing, Marathon uses an IBM 370/168 
and the TOTAL data base management system.

1974 was a difficult year for the data processing 
department at Marathon Oil. It was a period of 
rapid growth in the system development staff 
and, to compound the problem, it was a period of 
significant staff turnover. The result was that 
there were many new people added to handle the 
growing system development workload. The new 
people were using a variety of techniques and 
practices. Projects were slipping behind schedule 
and data processing management was finding it 
hard to know the status of each project. 

In looking for a solution to these problems, data 
processing management at Marathon came across 
PRIDE, developed and marketed by M. Bryce &
Associates, Inc., Cincinnati, Ohio. We discussed 
PRIDE in our December 1974 issue. PRIDE is a 
structured system development methodology 
that uses nine well-defined phases. All of the nec- 
essary work products are defined for each phase. 
Development projects that are conducted under 
PRIDE can thus progress in a standard, controlled
manner. Marathon liked the ideas they found in
PRIDE and purchased it in late 1974.

From its inception, PRIDE has incorporated a
complete data management philosophy. That is,
all data definitions and process definitions that
fall within the scope of the application system
being built are captured. Initially, this function
was performed manually. In 1974, it was mechanized
under the name PRIDE-Logik. PRIDE-Logik
thus provides the data dictionary function for 
PRIDE, operating in a batch mode. However, it is 
not necessary to use PRIDE in order to use PRIDE- 
Logik. 

During 1975 and 1976, Marathon was well satisfied
with their use of PRIDE. So in early 1976,
they decided to obtain and install PRIDE-Logik, so
as to upgrade the data management function for 
system development and maintenance. They also 
set up the data administration function (which 
they call the “data manager” function) under the 
systems manager.

(In describing their use of PRIDE-Logik, the
people at Marathon pointed out that their efforts 
have been most successful when they were deal- 
ing with new application systems that did not 
share data with existing application systems. The 
discussion that follows is based on such situations. 
However, where an application system is being 
developed that enmeshes with one or more existing
systems whose data definitions are not under
PRIDE-Logik, non-trivial problems do arise. “It is
not all peaches and cream in such cases,” we were
told.) 

Use of PRIDE-Logik. As indicated earlier, PRIDE
itself has nine phases. Phases 1 to 3 cover the sys- 
tem study, system design, and sub-system design. 
Phases 4 to 6 cover the design of both adminis- 
trative (manual) and computer procedures, pro- 
gram design, and program test. Phases 7 to 9 
cover system testing, system operation, and sys- 
tem audit. 

At the end of the first phase, the system analysts 
at Marathon enter skeleton definitions of the new 
system into PRIDE-Logik. These definitions are
the first pass at the files, inputs, and outputs that 
will be needed. By the end of the second phase 
(system design), the analysts are able to enter nar- 
rative descriptions of the entities, plus rough for- 
mats of the system outputs to show to users. Also, 
additional definition information is entered for in- 
puts and files. 

Further definition information is entered dur- 
ing the third phase (sub-system design). By this 
point, the sequence in which the sub-systems and 
their files are to be processed can be determined. 
And during the design of the administrative 
procedures, the analysts can enter the definitions 
of the clerical documents and the data that will be 
on them. Also, at the end of phases 1, 2, and 3, 
PRIDE-Logik has the facility to perform system diagnostics,
such as checking all entities for completeness
and logical flow.

Up to this point, PRIDE-Logik has been used
mainly as a repository of information. If there are 
questions and queries about the data definitions, 
these can be answered from the output documentation
of the dictionary. But it is during the design
of the computer procedures that Marathon begins
to use the powerful analytical capabilities of
PRIDE-Logik.

EDP ANALYZER, JANUARY 1978

As computer procedure design starts, the pro- 
grammer(s) ask for printouts of all documentation 
applying to the programs they will be working 
on. This documentation covers record definitions, 
output definitions and formats, and the like. But 
in addition, they ask PRIDE-Logik to track all data
elements, from their source to their ultimate use. 
The dictionary system flags all data fields that 
have been defined but not used, records specified 
to be used in a program but for which no input has 
been provided, and so on. In short, PRIDE-Logik
looks for inconsistencies in the definitions. Note 
that this analysis covers not only the mechanized 
part of the new application but also all of the 
manual procedures that provide input for it and 
use the outputs from it. 
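
The two checks mentioned above can be sketched in a few lines of Python. This is only an illustration of the kind of analysis involved, not a description of how PRIDE-Logik is implemented; the field, record, and program names are hypothetical, as is the idea of recording a manual source document for each record.

```python
# Hypothetical dictionary contents for one small application.
DEFINED_FIELDS = {"CUST-NO", "CUST-NAME", "BALANCE-DUE", "REGION-CODE"}

# Fields each program refers to.
FIELDS_USED = {
    "BILLING01": {"CUST-NO", "BALANCE-DUE"},
    "MAILING03": {"CUST-NO", "CUST-NAME"},
}

# Records each program expects to read.
RECORDS_READ = {
    "BILLING01": {"CUSTOMER-RECORD", "PAYMENT-RECORD"},
    "MAILING03": {"CUSTOMER-RECORD"},
}

# Records for which some input (manual or mechanized) has been defined.
RECORD_SOURCES = {"CUSTOMER-RECORD": "new account form"}


def unused_fields(defined, fields_used):
    """Fields that have been defined but that no program uses."""
    used = set().union(*fields_used.values())
    return sorted(defined - used)


def records_without_input(records_read, sources):
    """Records a program expects but for which no input is defined."""
    missing = {}
    for program, records in records_read.items():
        lacking = sorted(r for r in records if r not in sources)
        if lacking:
            missing[program] = lacking
    return missing


if __name__ == "__main__":
    print("defined but not used:", unused_fields(DEFINED_FIELDS, FIELDS_USED))
    print("no input provided:", records_without_input(RECORDS_READ, RECORD_SOURCES))
```

Each flagged item is a question for the analyst rather than a proven error; an unused field, for instance, may simply belong to a sub-system that has not yet been defined.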

If mistakes are found at this point, the analysts 
are called back in to make the corrections. The 
new definitions are entered at the sub-system de- 
sign point and all of the subsequent processing is 
repeated, to see if anything else comes to light. 
When the corrections pass all of the tests, the pro- 
grammers proceed with computer procedure 
design. 

At the end of the seventh phase (system test), 
and after the users have accepted the new system, 
all of the definitions are put under the control of 
the data administrator function. From that point 
on, changes to the definitions can be made only by 
submitting them through the data administrator 
function. In addition, all proposed changes are assessed
for their total impact on the application
system by using PRIDE-Logik.

While Marathon Oil is pleased with their use of 
PRIDE-Logik, they also point out that they are still
in the process of learning to use it. In general, the 
system analysts provide input to the dictionary, 
for defining the new system, and the programmers 
use the outputs from the dictionary for designing 
and coding programs. Analysts find that they are
entering information and retrieving some definitions
but as yet are making little use of the analytical
powers. The programmers are using the
analytical capabilities of the package but find that
they have lost some of their previous data definition
prerogatives. And the new function of data
administration has taken away functions formerly
performed by both the analysts and programmers.

So, said data processing management at Mara- 
thon Oil, a tool as powerful as this data dictionary 
has many benefits but it also has some non-trivial 
side effects. It brings out into the open many 
differences of opinion between analysts, pro- 
grammers, and data administration. This is not 
due to characteristics of the package itself but 
rather is due to the fact that a new discipline is 
being imposed that changes the traditional ways 
of doing things. 

When we visited Marathon Oil, they were using
PRIDE-Logik simply to support the application
system development process. Sometime in
the future, they expect to extend the use of the
dictionary into daily operations, by means of interfaces
with their PANVALET program library system
and with their TOTAL data base management
system. Further, they hope to use PRIDE-Logik as
a repository of test cases flowing from users, pro- 
grammers, analysts, internal auditors, and data 
administration. 

It is clear, we think, that as more application 
systems are developed using PRIDE-Logik, and as
major enhancements occur that use the same 
process, Marathon Oil will gradually get all of its 
data and process definitions into the data diction- 
ary. Further, this goal will be realized as a by- 
product of the company’s system development 
and maintenance process. 

Just in case the use of PRIDE-Logik may seem
like a “special case” to you, consider the remarks 
made by Chris Gane at the DATAMANAGER Users 
Group conference held in New Orleans, Loui- 
siana, in May 1977. 


Remarks of Chris Gane 


Chris Gane, of Improved System Technologies, 
Inc., New York City, was an invited speaker at the 
above-mentioned users group meeting. The sub- 
ject of his talk was, “DATAMANAGER in a struc- 
tured analysis environment.” The paper is 
included in the proceedings of the meeting, Ref- 
erence 4. 

In his talk, Gane went through an example of 
the system building process, using an easily un- 
derstood application—that of a company that ad- 
vertises books for sale, receives mail orders from 
customers, and enters orders to publishers to re- 
plenish its inventory. Using a top-down approach, 
Gane gradually adds more detail into the application.
As he does so, he identifies and builds up
data flows, definitions of data stores, and defini- 
tions of the data. Also, some of the logical proc- 
esses that will be used are defined. Then, says 
Gane, a logical data dictionary should be able to 
store and handle the definitions of data flows, data 
stores, data elements, and possibly some of the 
process logic. 

These are essentially the same entities that 
Marathon Oil is storing and handling in its data 
dictionary. 

Since Gane was addressing the DATAMANAGER
Users Group, he assessed the capabilities of DATAMANAGER
for performing these functions. Note
that DATAMANAGER was originally developed to 
support a DBMS and conventional files in daily 
computer operations, and Gane was assessing 
its ability to support the system development 
process. 

DATAMANAGER clearly could handle the data
element definitions and the data group defini- 
tions, said Gane, since that is what it was designed 
to do. If more definitional information is desired
to support the development process than is
normally carried in the dictionary, it can be handled
under the NOTE capability.

Data flow definitions can also be handled, al- 
though not as conveniently, said Gane. These def- 
initions would include source, destination, 
description, data groups that are used, and so on. 
There seems to be no reason why current capabil- 
ities cannot be adapted by the user to this pur- 
pose, he said. 

Three other types of entities—process defini- 
tions, definitions of external entities (such as other 
systems, departments, etc. that are external to the 
system being developed), and glossary items—also 
can be handled by the dictionary. 

With such information in the dictionary, there 
are a number of outputs that the users would de- 
sire, said Gane. One is ordered listings of the en- 
tities, and another is cross-reference listings that 
show relationships between entities. Both of these 
capabilities are already available in DATAMANAGER,
he said. Two other capabilities are desired
but not yet available in the package. One is the 
ability to search the dictionary based on a key- 
word or character pattern. The second is con- 
sistency and completeness checking to see, for 
instance, if there are data flows without sources or 
destinations, or data elements in data stores that
have not been entered via input.
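
The two capabilities that Gane found missing can be sketched in a few lines. The following is our own illustration, under assumed record layouts, and is not a feature of DATAMANAGER or any other package; the class names, field names, and the sample entries (loosely based on the mail-order book example) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataFlow:
    name: str
    source: Optional[str]        # process, data store, or external entity
    destination: Optional[str]
    elements: list = field(default_factory=list)


@dataclass
class DataStore:
    name: str
    elements: list = field(default_factory=list)


def check_dictionary(flows, stores):
    """Return a list of consistency and completeness problems."""
    problems = []
    for flow in flows:
        if not flow.source:
            problems.append(f"data flow {flow.name} has no source")
        if not flow.destination:
            problems.append(f"data flow {flow.name} has no destination")
    for store in stores:
        # Elements carried by any flow whose destination is this store.
        arriving = {e for f in flows if f.destination == store.name
                    for e in f.elements}
        for element in store.elements:
            if element not in arriving:
                problems.append(
                    f"element {element} in store {store.name} "
                    "is never entered via input")
    return problems


def search(names, pattern):
    """Case-insensitive substring search over entity names."""
    return [n for n in names if pattern.lower() in n.lower()]


flows = [
    DataFlow("customer-order", None, "enter-order", ["isbn", "quantity"]),
    DataFlow("stock-update", "receive-shipment", "book-inventory",
             ["isbn", "quantity-on-hand"]),
]
stores = [DataStore("book-inventory", ["isbn", "title", "quantity-on-hand"])]

for problem in check_dictionary(flows, stores):
    print(problem)
```

In this sample, the check reports that the customer-order flow has no source and that the title element sits in the book-inventory store without ever arriving on an input flow.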

Is it practical to consider getting all of an or- 
ganization’s data and process definitions under 
control via the use of a data dictionary? Well, 
consider the user experiences that we have discussed.
DATAMANAGER and PRIDE-Logik are two
quite different data dictionary systems, originally 
designed with quite different goals in mind—the 
former mainly to support production and the lat- 
ter mainly to support development. But they 
seem to be tending in the same direction. PRIDE-Logik
interfaces with several DBMSs, for production
use, and DATAMANAGER can support many development
functions, as discussed. Yes, it is
practical to consider such a goal. 


Getting mechanized data definitions 
under control 


The objective of getting only the mechanized 
data definitions, as opposed to all of the organiza- 
tion’s data definitions, under control is the more 
typical situation, from what we have observed. 
This goal has two sub-goals: (a) “cleaning
up the mess” and (b) aiding end users.


“Cleaning up the mess” 


In most organizations that we have talked to, 
the data definitions and process definitions are 
“not in good shape.” Typically, organizations 
have many conventional tape and disk files sup- 
porting numerous applications. Among these 
many files, the same basic data item might occur 
in two or more files, have two or more names, use 
different codes, have different field lengths, and so 
on. Even data bases are not immune to this dis- 
ease. Some organizations use many small data 
bases, and treat the DBMS as just another access
method. In such situations, a data dictionary may 
be installed to help clean up and standardize the 
definitions. 

Lefkovits (Reference 1) identifies three kinds of 
“bridges” offered by some data dictionaries to 
help in this clean up function. One type of bridge 
is used to collect existing data definitions from existing
files, program data divisions, and DBMS directories,
and then load the dictionary with them.
A second kind of bridge supplies data definitions 
to conventional files and their programs. And a 
third kind of bridge supplies data definitions to 
the data directory of a DBMS. Lefkovits points out
that there are significant differences among the
dictionaries on the market as to the bridges they
provide. 

These bridges can be used to help clean up the 
existing data definitions and to ensure that the
same practices are not continued in the future. 
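
As a rough illustration of the first kind of bridge, the sketch below pulls elementary item definitions out of a COBOL-style record description and turns them into dictionary entries. It is a deliberately simplified example of our own: it assumes one item per line with a PICTURE clause and ignores REDEFINES, OCCURS, continuation lines, and the like, which a commercial bridge would have to handle. The sample copybook and the entry layout are hypothetical.

```python
import re

# Matches simple elementary items such as "05  CUST-NO  PIC 9(6)."
ITEM = re.compile(
    r"^\s*(\d{2})\s+([A-Z0-9-]+)\s+PIC(?:TURE)?\s+(?:IS\s+)?(\S+?)\.\s*$")

COPYBOOK = """\
       01  CUSTOMER-RECORD.
           05  CUST-NO      PIC 9(6).
           05  CUST-NAME    PIC X(30).
           05  BALANCE      PIC S9(7)V99.
"""


def load_definitions(source_text, origin):
    """Collect one dictionary entry per elementary item found in the text."""
    entries = []
    for line in source_text.splitlines():
        match = ITEM.match(line)
        if match:
            level, name, picture = match.groups()
            entries.append({"name": name, "level": int(level),
                            "picture": picture, "origin": origin})
    return entries


for entry in load_definitions(COPYBOOK, origin="ORDERS"):
    print(entry["origin"], entry["name"], entry["picture"])
```

Recording the origin of each entry matters, because the clean-up work that follows depends on knowing which file or program each definition came from.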


Identification-of-problem mechanism. Using 
the bridge mechanism to load the dictionary with 
existing data definitions can assist in identifying 
the problems. All of the definitions are brought 
together at one point. Standard formats help in 
making comparisons. When undesired redun- 
dancies and inconsistencies are uncovered, the 
data administration function can work with users, 
analysts, programmers, and others to select stand- 
ard definitions that will be used in the future. 
These standard definitions can be used in all new 
application systems and can be incorporated in 
existing applications when other changes or en- 
hancements are made to them. 
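
Once the definitions are in one place and in a standard format, the comparison itself is mechanical. The sketch below, our own illustration with hypothetical data, groups loaded definitions by the basic data item they represent and reports any item that appears under more than one name or more than one format. It assumes the data administrator has already decided which fields represent the same basic item; that judgment is the hard part and is not automated here.

```python
from collections import defaultdict

# Hypothetical loaded definitions. "item" is the basic data item that the
# data administrator has decided each field represents.
DEFINITIONS = [
    {"item": "customer number", "file": "ORDERS",   "name": "CUST-NO",  "picture": "9(6)"},
    {"item": "customer number", "file": "BILLING",  "name": "CUSTNUM",  "picture": "X(8)"},
    {"item": "customer number", "file": "SHIPPING", "name": "CUST-NO",  "picture": "9(6)"},
    {"item": "order date",      "file": "ORDERS",   "name": "ORD-DATE", "picture": "9(6)"},
]


def find_inconsistencies(definitions):
    """Report items defined with differing names or formats across files."""
    by_item = defaultdict(list)
    for definition in definitions:
        by_item[definition["item"]].append(definition)
    report = {}
    for item, group in by_item.items():
        names = sorted({d["name"] for d in group})
        pictures = sorted({d["picture"] for d in group})
        if len(names) > 1 or len(pictures) > 1:
            report[item] = {"files": sorted(d["file"] for d in group),
                            "names": names, "pictures": pictures}
    return report


for item, details in find_inconsistencies(DEFINITIONS).items():
    print(item, details)
```

Here the customer number is flagged because it carries two names and two formats across three files, while the order date, defined only once, is not.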

As we pointed out earlier, this process probably 
should be done with only a portion of the total 
mechanized data definitions at a time. It might be 
relatively easy to load all current definitions into 
the dictionary via the bridge. But the magnitude
of the “mess” might be so great that people are 
discouraged from even starting the clean up. 


Enforcement mechanism. As more and more 
standard data definitions are entered into the dic- 
tionary, the other two types of bridges can be 
used as a part of an enforcement mechanism. At 
some logical point in time, the policy can be 
adopted that henceforth all program data divi- 
sions, all input-output area definitions, and all 
DBMS directory inputs will be obtained via the 
dictionary. No direct inputs will be allowed. The 
dictionary can thus help ensure that only the approved
data definitions are used.
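
The enforcement idea can be sketched as a generator that produces record descriptions only from approved entries and refuses anything else. This is a minimal example of our own, not the output format of any particular package; the table of standard elements and the function name are hypothetical.

```python
# Hypothetical approved definitions held in the dictionary.
STANDARD_ELEMENTS = {
    "CUST-NO": "9(6)",
    "CUST-NAME": "X(30)",
    "BALANCE": "S9(7)V99",
}


def generate_record(record_name, element_names, elements=STANDARD_ELEMENTS):
    """Build a COBOL-style record description from approved elements only."""
    lines = [f"       01  {record_name}."]
    for name in element_names:
        if name not in elements:
            # No direct inputs: an unapproved element stops generation.
            raise KeyError(f"{name} is not an approved data definition")
        lines.append(f"           05  {name:<20} PIC {elements[name]}.")
    return "\n".join(lines)


print(generate_record("CUSTOMER-RECORD", ["CUST-NO", "BALANCE"]))
```

Because the programmer supplies only element names, the formats always come from the dictionary, and a request for an element that has not been approved fails at generation time rather than in production.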

Lefkovits points out that, in the future, dictionaries
might well be integrated with operating systems
and DBMS. All accesses to data would first
flow through the dictionary, to pick up the data 
definitions. When a change has been made to a 
data definition, all programs using that data defi- 
nition would be prevented from running until any 
necessary changes had been made to them and 
they were once again “released” for production. 
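
One simple way to picture such a release check is to record, for each program, the version of every data definition it was released against, and to compare those versions with the current ones before a run. The sketch below is our own assumption about how this might be done, not a description of any existing product; the version tables and function names are hypothetical.

```python
# Hypothetical current version number of each data definition.
DEFINITION_VERSIONS = {"CUST-NO": 3, "BALANCE": 1}

# Hypothetical record of the definition versions each program was
# released against.
RELEASED_AGAINST = {
    "BILLING-PROGRAM": {"CUST-NO": 3, "BALANCE": 1},
    "ORDER-ENTRY-PROGRAM": {"CUST-NO": 2},
}


def stale_definitions(program, released=RELEASED_AGAINST,
                      current=DEFINITION_VERSIONS):
    """List definitions that changed since the program was released."""
    used = released[program]
    return sorted(name for name, version in used.items()
                  if current.get(name) != version)


def may_run(program):
    """A program may run only if none of its definitions have changed."""
    return not stale_definitions(program)


for program in RELEASED_AGAINST:
    status = "released" if may_run(program) else "held"
    print(program, status, stale_definitions(program))
```

In this example the order entry program is held, because CUST-NO has been redefined since the program was last released.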


Aid to end users 


A dictionary can assist end users in several 
ways. Users can look up the approved data definitions,
in order to interpret data on reports. Users
can also find out if particular types of data items 
in which they are interested are already in a data 
base, and if so, where. Users can find out what 
relationships have been defined among data 
items. 

“Users,” in these examples, can be interpreted
broadly. The term can apply to managers and 
other members of departments of the organiza- 
tion, company executives, staff members, auditors 
(both internal and external), as well as system ana- 
lysts and programmers. 

Field experience with dictionaries has shown 
that the on-line query capability is almost manda- 
tory for performing the above services. The prob- 
lem is, the definitions change. It can be quite 
expensive to order a printout of a large portion of 
the dictionary just to find a recently changed defi- 
nition. And if such a printout is not ordered, but 
instead some existing printout is consulted, there 
is no assurance that the information found is up to 
date. 

This access to the dictionary raises the problems
of security and integrity. Authorized access
both to entries in the dictionary and to the data
itself in the data base must be carefully defined.
Some people will be allowed a read-only access, 
while others will have update privileges. 
Unauthorized accesses must be inhibited. We 
only mention these problems here; a discussion of 
them is beyond the scope of this report. 


SELECTING A DATA DICTIONARY 


The discussion so far has, we hope, indicated
some of the major features of data dictionaries 
that many users and potential users desire. These 
include the three types of bridges, effective secu- 
rity, integrity, and enforcement mechanisms, 
ability to support system development as well as 
daily operations, and so on. 

The point to be made here is that no current 
data dictionary has all of the desired features that 
we have discussed. In fact, it is not at all unusual 
for a company to begin investigating data diction- 
aries and find that all existing packages fall far 
short of its desires. The people making the study 
might then recommend to management that they 
be allowed to develop a much more desirable dic- 
tionary in-house. 

Our advice can be expressed in three words: 
don’t do it.

To illustrate the problems that such a course of
action can encounter, consider the experience of 
one company we talked to. This company had a 
staff member investigate dictionaries on the mar- 
ket; no “suitable” package was found, and a proj- 
ect to build a dictionary in-house was authorized. 

The initial version of the in-house dictionary 
was built. It did work and it did provide services 
that were not available on commercial packages. 
In fact, some of those services still are not avail- 
able on any commercial package. To help recoup 
some of the investment in the package, the com- 
pany offered it for sale to others, and some sales 
were made. And then calamity struck. The two 
key people who had developed the dictionary left 
the company. 

The company at that time was undergoing a 
big expansion in its data processing staff, to 
handle a much increased volume of system devel- 
opment. No one else was available with the inter- 
est and background to handle the dictionary 
project. The sales of the package were cancelled 
because there was no one available to help install 
the package. Some of the promised enhancements 
to the package, upon which the company was de- 
pending, could not be accomplished. So the proj- 
ect was officially stopped and the package was 
withdrawn from the market. 

This company is still using a very limited por- 
tion of the dictionary, but is now in the process of 
selecting a commercial package. Staff members 
still like the concepts upon which the in-house 
package was designed. But it turned out just not 
to be feasible to build, maintain, enhance, sell, 
and support the in-house package in the face of 
staff turnover and the other demands on the data 
processing department. 

We might go even further with our recommen- 
dation: do not even try to modify a commercial 
dictionary to better meet your needs. Much the 
same types of problems as those just discussed can 
arise when you try to maintain and enhance a 
modified package. It is much better to work 
through the users’ group for the package you se- 
lect, to find others interested in the same enhance- 
ments and to encourage the supplier to make 
those enhancements. 

What if you have other than IBM equipment? 
Some of the dictionaries on the market have been 
designed to run only on IBM equipment. Some 
others, while they might run on other brands of
equipment, so far have been implemented only
for IBM equipment. With non-IBM equipment, 
the freedom of choice is very limited. There are at 
least two dictionaries on the market that have 
been written in ANS COBOL (DATA CATALOGUE and
PRIDE-Logik) so as to run on other makes of equipment.
In any case, it would be more desirable to
try to get the supplier to adapt the package for 
your type of equipment (almost certainly at extra 
cost) than to try to do the job in-house. 


A preferred approach 


Early in this report, we gave a number of fac- 
tors involved in the decision to install a data dic- 
tionary. We also indicated that it was not unusual 
for companies to decide on some of these factors 
in a quick, “practical” manner. These “practical” 
decisions might be: install a data dictionary to 
support existing DBMS applications only; purchase
the dictionary rather than build it; and it is immaterial
whether it depends on the DBMS or not.

It seems to us that, if you are considering in- 
stalling a data dictionary, it should be recognized 
that sooner or later you will want your dictionary 
to cover all data and process definitions, not just 
for the mechanized portions of systems. This may 
or may not be a near-term goal for you. But if you 
can at least make progress toward this goal as a 
byproduct of your installation of a dictionary, 
that is all to the good.

So we suggest that you lay out a longer term 
goal, for the use of a data dictionary, and then lay 
out a program for reaching that goal by a series of 
stand-alone projects. Someday data dictionaries 
will be just as important as data base management 
systems. It is going to take a good number of years 
for any organization to use a dictionary really 
effectively. 

For instance, your first project might be to get 
data base data definitions cleaned up and under 
control. Once that has been done, your next proj- 
ect might be to get conventional file data defini- 
tions similarly cleaned up and under control. If 
the same data items appear both in the data bases 
and in the conventional files, this may mean some 
further clean up for those items. Your next project 
may be to use the data dictionary to support system
development, in a manner similar to what
Marathon Oil is doing or to what Gane describes 
in his paper. These, of course, are only examples 
of a series of projects for getting data definitions
cleaned up and under control.

With this series of projects in mind, you would 
be in a position to select your data dictionary. Be- 
fore making this selection, we suggest that you 
read all of the references listed at the end of this 
report. By all means, obtain and read Lefkovits’ 
book (Reference 1). He discusses the six leading 
data dictionaries currently on the U.S. market, 
with one chapter for each. These dictionaries are: 
Arthur Andersen’s Lexicon, Cincom Systems’ 
Data Dictionary, IBM’s DB/DC Data Dictionary,
MSP’s DATAMANAGER, Synergetic’s DATA CATALOGUE,
and UCC 10. After describing these in
some detail, in a standard format to make com- 
parison easier, he analyzes and evaluates the six 
packages. There are substantial differences 
among these packages, but as Lefkovits says, 
there is no one best package. The “best” of the six
for any particular company depends on that com- 
pany’s needs. 

The two evaluation checklists (Reference 7) 
should also prove very helpful in making the se- 
lection of a data dictionary. Each of these seems 
to pose questions that emphasize the strong points 
of its respective dictionary. But still, each raises a 
good number of valid points that ought to be con- 
sidered. And if the two were used together, that 
would tend to remove some of the product bias 
that exists. 

But the selection should be made, we think, 
with both the current and the future needs in 
mind, not just the current needs. 


Conclusion 


Based on the growing sales of the dictionary 
packages on the market, it would appear that a 
trend toward the use of data dictionaries is under- 
way. The task of effectively installing and using a 
dictionary is not an easy one. The decisions of 
what to do and what not to do should not be made 
casually. 

As we pointed out in this report, the data ad- 
ministration function and the data dictionary sys- 
tem are both overhead items. Further, they can 
entail a good amount of expense and effort. So 
both are quite vulnerable to budget cuts, com- 
pany politics, and arguments to bypass them for 
the sake of expediency. 

But the data administration function and the 
data dictionary system together can provide an 
effective way to clean up a company’s data and
process definitions and to keep them clean in the
future. 

We have listed, earlier in this report, a number 
of the benefits that are offered by a data diction- 
ary, as well as some problems it can cause. The 
major benefit is hard to evaluate in monetary 
terms. A dictionary provides a means for greatly 
reducing undesired redundancies and inconsist- 
encies in the data definitions. With these undesir- 
ables reduced, both the operating people and the 
management of the organization should have 
fewer misunderstandings, fewer poor decisions 
that are based on misinterpreted data, and so on. 

In addition, we listed a number of more tan- 
gible benefits. System development times and 
costs should be reduced, because of the centrally 
maintained data definitions. Communication between
the development staff, users, management,
auditors, etc. is enhanced, leading to fewer errors
and false starts in system development. Inputs for 
program data divisions and DBMS directories can
be automatically generated. And there are other 
benefits which need not be repeated here. 

It is worth repeating as a final thought, though, 
the three conditions that Lefkovits feels must be 
met in order for an organization to achieve the 
benefits from a data dictionary system. These are: 
a good degree of commitment to the data diction- 
ary system, on the part of management, users, and 
data processing personnel; an effective data ad- 
ministration function that has the responsibility 
for and the custody of all data; and an effective 
method for planning and introducing change into 
information systems. 

We think that Lefkovits has summed up the situation
pretty well.